Add AggressiveInlining to EnsureInitializedCore method to allow delegates stack alloc#122983
Add AggressiveInlining to EnsureInitializedCore method to allow delegates stack alloc#122983SergeyTeplyakov wants to merge 4 commits intodotnet:mainfrom
Conversation
…ates stack alloc
Currently stack allocation (and overall de-abstraction) works only when JIT can has a full understanding of the code. And due to lack of cross-prodcedural analysis it means that for the optimization to work the operations should be inlined.
The same is true for `LazyInitializer.EnsureInitialized` that takes a delegate. Currently, because `EnsureInitializedCore` quite big and can't be inlined by default, JIT can't stack allocate the delegate passed to the method.
The following benchmark shows the impact of this change:
```csharp
[MemoryDiagnoser]
[ShortRunJob(RuntimeMoniker.Net90)]
[ShortRunJob(RuntimeMoniker.Net10_0)]
[CategoriesColumn]
[HideColumns(Column.Job, Column.Error, Column.Median, Column.RatioSD, Column.Ratio, Column.AllocRatio, Column.StdDev, Column.Gen0)]
public class BenchmarkingDelegates
{
[Benchmark]
public string EnsureInitialized_Old()
{
string defaultValue = "defaultValue";
return LazyInitializer.EnsureInitialized(ref _cachedValue, () => defaultValue);
}
[Benchmark]
public string EnsureInitialized_New()
{
string defaultValue = "defaultValue";
return MyLazyInitializer.EnsureInitialized(ref _cachedValue, () => defaultValue);
}
```
The output:
```csharp
| Method | Runtime | Mean | Allocated |
|---------------------- |---------- |---------:|----------:|
| EnsureInitialized_Old | .NET 10.0 | 5.374 ns | 88 B |
| EnsureInitialized_New | .NET 10.0 | 1.951 ns | 24 B |
| EnsureInitialized_Old | .NET 9.0 | 6.194 ns | 88 B |
| EnsureInitialized_New | .NET 9.0 | 6.041 ns | 88 B |
```
The `MyLazyInitializer` is the copy of the existing one with `AggressiveInlining` flag set.
| [MethodImpl(MethodImplOptions.AggressiveInlining)] | ||
| private static T EnsureInitializedCore<T>([NotNull] ref T? target, Func<T> valueFactory) where T : class | ||
| { | ||
| T value = valueFactory() ?? throw new InvalidOperationException(SR.Lazy_StaticInit_InvalidOperation); |
There was a problem hiding this comment.
Moving throw new InvalidOperationException(SR.Lazy_StaticInit_InvalidOperation); to a throw-helper should also work, probably it's a better approach
There was a problem hiding this comment.
I've tried just moving it to a helper without AggressiveInlining, but it was not enough. The AggressiveInlining was still required in this case. And because of that, I've decided not to move the 'throw' into a helper.
There was a problem hiding this comment.
hm.. then it's fine as is, but the inliner needs some tuning apparently
There was a problem hiding this comment.
There are more EnsureInitializedCore overloads in this file, any reason why only this one needs AggressiveInlining?
There was a problem hiding this comment.
There are 2 more overloads that takes Func:
public static T EnsureInitialized<T>([NotNull] ref T? target, [NotNullIfNotNull(nameof(syncLock))] ref object? syncLock, Func<T> valueFactory) where T : class;
public static T EnsureInitialized<T>([AllowNull] ref T target, ref bool initialized, [NotNullIfNotNull(nameof(syncLock))] ref object? syncLock, Func<T> valueFactory);I've tried adding AggressiveInlining to both of them (to EnsureInitializedCore methods), and here are the results:
| Method | Runtime | Mean | Allocated |
|------------------------------------------- |---------- |----------:|----------:|
| EnsureInitialized_Old | .NET 10.0 | 5.873 ns | 88 B |
| EnsureInitialized_New | .NET 10.0 | 1.744 ns | 24 B |
| EnsureInitialized_SyncRoot_Old | .NET 10.0 | 5.729 ns | 88 B |
| EnsureInitialized_SyncRoot_New | .NET 10.0 | 5.692 ns | 88 B |
| EnsureInitialized_Initialized_SyncRoot_Old | .NET 10.0 | 18.475 ns | 24 B |
| EnsureInitialized_Initialized_SyncRoot_New | .NET 10.0 | 18.614 ns | 24 B |
| EnsureInitialized_Old | .NET 9.0 | 6.522 ns | 88 B |
| EnsureInitialized_New | .NET 9.0 | 6.561 ns | 88 B |
| EnsureInitialized_SyncRoot_Old | .NET 9.0 | 6.566 ns | 88 B |
| EnsureInitialized_SyncRoot_New | .NET 9.0 | 6.485 ns | 88 B |
| EnsureInitialized_Initialized_SyncRoot_Old | .NET 9.0 | 22.287 ns | 88 B |
| EnsureInitialized_Initialized_SyncRoot_New | .NET 9.0 | 19.889 ns | 88 B |
So in both cases the AggressiveInlining has no effect. For the first case, the JIT can't inline the method even with AggressiveInlining attribute so we still have 88B of allocations (for both the delegate and the closure), and for the second overload the JIT can inline the old version, so in both cases we have only the delegate allocation.
So I can add the attribute for consistency, but it seems that it has no effect currently for other 2 cases.
There was a problem hiding this comment.
Can you share what specifically you tried with regards to outlining the throw? With that this attr really shouldn't be necessary, and if it is, that suggests a broader inlining issue we should fix instead.
There was a problem hiding this comment.
Pushed another commit with the proposed change and trigerred the EgorBot to see the before and after results.
Running this version locally was not enough to make the Core method inlined by the JIT.
|
@EgorBot -arm -amd using System.Threading;
using BenchmarkDotNet.Attributes;
[MemoryDiagnoser]
public class Benchmark
{
static string _cachedValue = null;
[Benchmark]
public string EnsureInitialized_Old()
{
string defaultValue = "defaultValue";
return LazyInitializer.EnsureInitialized(ref _cachedValue, () => defaultValue);
}
} |
There was a problem hiding this comment.
Pull request overview
This PR adds the AggressiveInlining attribute to the EnsureInitializedCore<T> method in LazyInitializer to enable JIT compiler optimizations, specifically delegate stack allocation. Benchmark results show significant performance improvements in .NET 10.0: execution time reduced from 5.374ns to 1.951ns, and allocations decreased from 88B to 24B.
Key Changes:
- Added
[MethodImpl(MethodImplOptions.AggressiveInlining)]attribute to theEnsureInitializedCore<T>method that takes aFunc<T>parameter - Added the required
using System.Runtime.CompilerServices;directive
|
Tagging @stephentoub for an extra pair of eyes on this change. |
Checking if extracting `throw` helps JIT to inline the method without using `MethodImpl` attribute.
|
@EgorBot -arm -amd |
| T value = valueFactory() ?? Throw(); | ||
| Interlocked.CompareExchange(ref target, value, null!); | ||
| Debug.Assert(target != null); | ||
| return target; | ||
|
|
||
| static T Throw() => throw new InvalidOperationException(SR.Lazy_StaticInit_InvalidOperation); |
There was a problem hiding this comment.
| T value = valueFactory() ?? Throw(); | |
| Interlocked.CompareExchange(ref target, value, null!); | |
| Debug.Assert(target != null); | |
| return target; | |
| static T Throw() => throw new InvalidOperationException(SR.Lazy_StaticInit_InvalidOperation); | |
| T? value = valueFactory(); | |
| if (value is null) | |
| { | |
| Throw(); | |
| } | |
| Interlocked.CompareExchange(ref target, value, null!); | |
| Debug.Assert(target != null); | |
| return target; | |
| [DoesNotReturn] | |
| static void Throw() => throw new InvalidOperationException(SR.Lazy_StaticInit_InvalidOperation); |
There was a problem hiding this comment.
Happy to take this one. I've tried this version as well, and it was not inlined. My idea was that the generic helper was problematic here, so I've changed it to a void, but then changed it to generic in this PR.
Once the EgorBot will finish running my version, I'll accept this change to run it again.
There was a problem hiding this comment.
@SergeyTeplyakov @stephentoub this highlights a problem we have in JIT + PGO. We do make inliner more aggressive in hot blocks by making it less aggressive in cold blocks. Here
public static T EnsureInitialized<T>([NotNull] ref T? target, Func<T> valueFactory) where T : class
{
var read = Volatile.Read(ref target!);
if (read != null)
{
return read;
}
return EnsureInitializedCore(ref target, valueFactory);
}EnsureInitializedCore is basically a cold block, so inliner decides to give up on it. In most cases this is the right thing to do, but it does mess with the escape analysis at times. We had many other places similar to this with the same issue, until we fix it (by making inliner more sophisticated or implement inter-procedural analysis, the AggressiveInlining is the only way to go 😞). cc @AndyAyersMS
you can validate my theory by setting target to point to null so the cold path is always executed (I already checked locally).
There was a problem hiding this comment.
Oh, wait, this is EnsureInitiskized_Core_. We don't want that to be able inlined; it's very explicitly separated out as we don't want it cluttering up call sites.
There was a problem hiding this comment.
A method is used a lot, but the delegate can't be stack-allocated due to other reasons.
A super common case is a delegate that's a cached singleton because it's static. By forcing inlining, we're making the whole thing larger, which then negatively impacts the possibility of its callers being inlined / makes all such consumers larger... standard negative impact of inlining things that shouldn't be inlined.
There was a problem hiding this comment.
@EgorBo Would it be possible to teach the JIT to move the delegate allocation into the cold if instead?
Maybe.
We should be able to see that the delegate is partially dead and (with PGO) likely to be dead. If so we could sink its allocation and construction like you say. We haven't really looked into sinking partially dead code. It might be doable, though proving it is safe and moving all the right pieces might be hard.
Or if we had partial escape analysis... we'd see that the delegate was unlikely to escape and so we'd stack allocate it initially and then move it to the heap if needed.
There was a problem hiding this comment.
@stephentoub I did have concerns that this change might not be useful given all the trade offs and it seems that from your perspective its not worth taking it. Right?
If we don't see the path forward, I'm ok abandoning it.
There was a problem hiding this comment.
I appreciate the effort, but without a more comprehensive look at what this could negatively impact, I don't think we should take it.
…Initializer.cs Co-authored-by: Stephen Toub <stoub@microsoft.com>
|
@EgorBot -arm -amd using System.Threading;
using BenchmarkDotNet.Attributes;
[MemoryDiagnoser]
public class Benchmark
{
static string _cachedValue = null;
[Benchmark]
public string EnsureInitialized_Old()
{
string defaultValue = "defaultValue";
return LazyInitializer.EnsureInitialized(ref _cachedValue, () => defaultValue);
}
} |
| T? value = valueFactory(); | ||
| if (value is null) | ||
| { | ||
| Throw(); | ||
| } | ||
|
|
There was a problem hiding this comment.
| T? value = valueFactory(); | |
| if (value is null) | |
| { | |
| Throw(); | |
| } | |
| if (valueFactory() is not { } value) | |
| { | |
| Throw(); | |
| } |
😄
There was a problem hiding this comment.
This won't help to make the method more inlinable. So this is just a matter of style.
(Btw, my take is that don't really like the new version, maybe my brain didn't re-wire itself to parse it) :)
There was a problem hiding this comment.
Yes!
One less line of code.
Doesn't necessarily make it faster to compile either.
😄
|
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch |
|
Closing per #122983 (comment). Thanks. |

Currently stack allocation (and overall de-abstraction) works only when JIT can has a full understanding of the code. And due to lack of cross-prodcedural analysis it means that for the optimization to work the operations should be inlined.
The same is true for
LazyInitializer.EnsureInitializedthat takes a delegate. Currently, becauseEnsureInitializedCorequite big and can't be inlined by default, JIT can't stack allocate the delegate passed to the method.The following benchmark shows the impact of this change:
The output:
The
MyLazyInitializeris the copy of the existing one withAggressiveInliningflag set.